Fellow in Quantitative Methodology
London School of Economics
10 minute break
1 hour lunch
10 minute break
Introduction to Large Language Models
Attention Mechanism
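The core operation behind transformer models is scaled dot-product attention: each query is compared against all keys, the similarities are softmax-normalized into weights, and the output is the weighted average of the values. A minimal dependency-free sketch (the toy vectors below are invented for illustration):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists of vectors."""
    d = len(keys[0])  # key dimension, used for the 1/sqrt(d) scaling
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output = attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: one query attending over two key/value pairs.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Because the query is closer to the first key, the output leans toward the first value vector; production implementations do the same computation with batched matrix multiplications.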
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("Hello I'm a [MASK] model.")
[{'sequence': "[CLS] hello i'm a fashion model. [SEP]",
'score': 0.1073106899857521,
'token': 4827,
'token_str': 'fashion'},
{'sequence': "[CLS] hello i'm a role model. [SEP]",
'score': 0.08774490654468536,
'token': 2535,
'token_str': 'role'},
{'sequence': "[CLS] hello i'm a new model. [SEP]",
'score': 0.05338378623127937,
'token': 2047,
'token_str': 'new'},
{'sequence': "[CLS] hello i'm a super model. [SEP]",
'score': 0.04667217284440994,
'token': 3565,
'token_str': 'super'},
{'sequence': "[CLS] hello i'm a fine model. [SEP]",
'score': 0.027095865458250046,
'token': 2986,
 'token_str': 'fine'}]

Confusion Matrix
Mining Causality: AI-Assisted Search for Instrumental Variables Han (2024)
Introducing an Interpretable Deep Learning Approach to Domain-Specific Dictionary Creation: A Use Case for Conflict Prediction Häffner et al. (2023)
We train the neural networks on a corpus of conflict reports and match them with conflict event data. This corpus consists of over 14,000 expert-written International Crisis Group (ICG) CrisisWatch reports between 2003 and 2021.
Predicting Conflict Intensity
Evaluating the persuasive influence of political microtargeting with large language models Hackenburg & Margetts (2024)
[…] we integrate user data into GPT-4 prompts in real-time, facilitating the live creation of messages tailored to persuade individual users on political issues. We then deploy this application at scale to test whether personalized, microtargeted messaging offers a persuasive advantage compared to nontargeted messaging.
Microtargeting and Political Persuasion
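The quoted design hinges on folding user attributes into the prompt before each generation call. A minimal sketch of that templating step, assuming hypothetical user fields (`age`, `occupation`, `region`) and invented wording; the actual GPT-4 API call is omitted:

```python
def build_microtargeted_prompt(user, issue):
    """Fold user attributes into a persuasion prompt.

    The field names and phrasing here are illustrative, not the
    study's actual template.
    """
    return (
        f"Write a short message persuading a {user['age']}-year-old "
        f"{user['occupation']} from {user['region']} to support {issue}. "
        "Tailor the argument to their likely priorities."
    )

# Hypothetical user record, standing in for real-time user data.
user = {"age": 34, "occupation": "teacher", "region": "Manchester"}
prompt = build_microtargeted_prompt(user, "renewable energy investment")
print(prompt)
```

The nontargeted condition would use the same request with the user-specific clauses removed, which is what makes the persuasive-advantage comparison possible.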
Synthetically generated text for supervised text analysis Halterman (2025)
This article proposes using LLMs to generate synthetic training data for training smaller, traditional supervised text models.
Synthetic Data Generation
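The workflow has two halves: prompt an LLM for labeled examples, then parse its output into training pairs for a smaller supervised model. A sketch of both steps, with a stubbed response in place of a real API call (the prompt wording and the "conflict" label are invented for illustration):

```python
def generation_prompt(label, n=5):
    # Ask an LLM for n synthetic examples of one class (wording is illustrative).
    return (f"Write {n} short news sentences that a human coder would label "
            f"'{label}'. One sentence per line.")

def parse_examples(llm_output, label):
    """Turn line-separated LLM output into (text, label) training pairs."""
    return [(line.strip(), label)
            for line in llm_output.splitlines() if line.strip()]

# Stubbed LLM response, standing in for a real completion call.
fake_output = "Protesters gathered downtown.\nClashes erupted near the border.\n"
train = parse_examples(fake_output, "conflict")
print(train)
```

Repeating this per label yields a synthetic corpus that can train a conventional classifier, avoiding per-document LLM calls at inference time.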
Positioning Political Texts with Large Language Models by Asking and Averaging Le Mens & Gallego (2025)
We ask an LLM where a tweet or a sentence of a political text stands on the focal dimension and take the average of the LLM responses to position political actors such as US Senators, or longer texts such as UK party manifestos or EU policy speeches given in 10 different languages.
Scaling Political Texts
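The averaging step described above is simple to sketch: query the LLM repeatedly for a numeric placement, discard non-numeric replies, and average the rest. The stubbed replies and the 0-10 scale below are invented for illustration; the real pipeline sends each query to an LLM:

```python
from statistics import mean

def position_score(responses):
    """Average repeated LLM placements, skipping replies that are not numbers."""
    scores = []
    for r in responses:
        try:
            scores.append(float(r.strip()))
        except ValueError:
            continue  # skip refusals or malformed replies
    return mean(scores) if scores else None

# Stubbed replies to a placement question such as
# "Where does this text stand on a 0-10 left-right scale? Answer with a number."
replies = ["7", "6.5", "I cannot say", "8"]
print(position_score(replies))
```

Averaging over many such responses (and over many tweets or sentences per actor) is what smooths individual-query noise into a usable position estimate.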
Not a panacea
Bisbee et al. (2023)